Intro to raster data

In this part of the workshop we will talk about raster data. We will start with an introduction of the fundamental principles, packages, and metadata needed to work with raster data in R. We will discuss some of the most important metadata elements, including CRS and resolution. We continue to work with the tidyverse and here packages and we will use two additional packages to work with raster data: raster and rgdal.

The dataset we chose is a set of GeoTIFF files on Delft or subdivisions of it. The files were included in the archive you downloaded before the workshop.

View Raster File Attributes

The GeoTIFF format contains a set of embedded tags with metadata about the raster data. We can use the function GDALinfo() from the rgdal package to get information about our raster data before we read that data into R. It is recommended to do this before importing your data.

GDALinfo(here("data","tud-dsm-5m.tif"))
## rows        386 
## columns     722 
## bands       1 
## lower left origin.x        83565 
## lower left origin.y        445250 
## res.x       5 
## res.y       5 
## ysign       -1 
## oblique.x   0 
## oblique.y   0 
## driver      GTiff 
## projection  +proj=sterea +lat_0=52.1561605555556 +lon_0=5.38763888888889
## +k=0.9999079 +x_0=155000 +y_0=463000 +ellps=bessel +units=m +no_defs 
## file        /Users/claudiuforgaci/Projects/geospatial-data-carpentry-tud-2023-06/data/tud-dsm-5m.tif 
## apparent band summary:
##    GDType hasNoDataValue NoDataValue blockSize1 blockSize2
## 1 Float32          FALSE           0          2        722
## apparent band statistics:
##   Bmin Bmax Bmean Bsd
## 1    0    0     0   0
## Metadata:
## AREA_OR_POINT=Area

Open a Raster in R

Now that we’ve previewed the metadata for our GeoTIFF, let’s import this raster dataset. We are going to work with a Digital Surface Model (DSM) which is in the GeoTIFF format. We can use the raster() function to import a raster file.

DSM_TUD <- raster(here("data","tud-dsm-5m.tif"))
DSM_TUD
## class      : RasterLayer 
## dimensions : 386, 722, 278692  (nrow, ncol, ncell)
## resolution : 5, 5  (x, y)
## extent     : 83565, 87175, 445250, 447180  (xmin, xmax, ymin, ymax)
## crs        : +proj=sterea +lat_0=52.1561605555556 +lon_0=5.38763888888889 +k=0.9999079 +x_0=155000 +y_0=463000 +ellps=bessel +units=m +no_defs 
## source     : tud-dsm-5m.tif 
## names      : tud.dsm.5m 
## values     : 0, 0  (min, max)

Similar to other data structures in R like vectors and data frame columns, descriptive statistics for raster data can be retrieved with the summary() function.

summary(DSM_TUD)
##         tud.dsm.5m
## Min.    -5.2235179
## 1st Qu. -0.6958068
## Median   0.5665370
## 3rd Qu.  4.4967444
## Max.    89.7837830
## NA's     0.0000000

But note the warning. Unless you force R to calculate these statistics using every cell, it will take a random sample of 100,000 cells and calculate from them instead. Now, raster objects are not data frames so you cannot count the cells with nrow(), but with the ncell() function of the raster package.

summary(DSM_TUD, maxsamp = ncell(DSM_TUD))
##         tud.dsm.5m
## Min.    -5.3906898
## 1st Qu. -0.7008305
## Median   0.5572625
## 3rd Qu.  4.4648066
## Max.    92.0810165
## NA's     0.0000000

To visualise raster data in ggplot2, we will need to convert it to a data frame. The raster package has a as.data.frame function for that purpose.

DSM_TUD_df <- as.data.frame(DSM_TUD, xy = TRUE)

Now when we view the structure of our data, we will see a standard data frame format.

str(DSM_TUD_df)
## 'data.frame':    278692 obs. of  3 variables:
##  $ x         : num  83568 83572 83578 83582 83588 ...
##  $ y         : num  447178 447178 447178 447178 447178 ...
##  $ tud.dsm.5m: num  10.34 8.64 1.25 1.12 2.13 ...

We can use ggplot() to plot this data with another geom_ function called geom_raster(). The coord_quickmap() gives a quick approximation that preserves straight lines. This approximation is suitable for small areas that are not too close to the poles.

ggplot() +
    geom_raster(data = DSM_TUD_df , aes(x = x, y = y, fill = tud.dsm.5m)) +
    scale_fill_viridis_c() +  # remember, this color palette was introduced in the first lesson
    coord_quickmap() 

This map, a so-called Digital Surface Model, shows the elevation of our study site, including buildings and vegetation.

For faster previews, you can use the plot() function.

plot(DSM_TUD)

But what units are these? This information is specified in the CRS, so let’s have a closer look at the CRS of our raster.

View Raster Coordinate Reference System (CRS) in R

With raster objects we use the crs() function from the raster package.

crs(DSM_TUD)
## Coordinate Reference System:
## Deprecated Proj.4 representation:
##  +proj=sterea +lat_0=52.1561605555556 +lon_0=5.38763888888889
## +k=0.9999079 +x_0=155000 +y_0=463000 +ellps=bessel +units=m +no_defs 
## WKT2 2019 representation:
## PROJCRS["unknown",
##     BASEGEOGCRS["unknown",
##         DATUM["Unknown based on Bessel 1841 ellipsoid",
##             ELLIPSOID["Bessel 1841",6377397.155,299.1528128,
##                 LENGTHUNIT["metre",1,
##                     ID["EPSG",9001]]]],
##         PRIMEM["Greenwich",0,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8901]]],
##     CONVERSION["unknown",
##         METHOD["Oblique Stereographic",
##             ID["EPSG",9809]],
##         PARAMETER["Latitude of natural origin",52.1561605555556,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8801]],
##         PARAMETER["Longitude of natural origin",5.38763888888889,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8802]],
##         PARAMETER["Scale factor at natural origin",0.9999079,
##             SCALEUNIT["unity",1],
##             ID["EPSG",8805]],
##         PARAMETER["False easting",155000,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8806]],
##         PARAMETER["False northing",463000,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8807]]],
##     CS[Cartesian,2],
##         AXIS["(E)",east,
##             ORDER[1],
##             LENGTHUNIT["metre",1,
##                 ID["EPSG",9001]]],
##         AXIS["(N)",north,
##             ORDER[2],
##             LENGTHUNIT["metre",1,
##                 ID["EPSG",9001]]]]

We see that the units are in meters in the Proj4 string: +units=m.

Calculate the Min and Max value

It is useful to know the minimum and maximum values of a raster dataset. In the case of a DSM, those values represent the min/max elevation range of our site.

minValue(DSM_TUD)
## [1] 0
maxValue(DSM_TUD)
## [1] 0

If Min and Max values haven’t been calculated, you can set them with the raster::setMinMax() function.

DSM_TUD <- raster::setMinMax(DSM_TUD)

Raster bands

To see how many bands a raster dataset has, use the raster::nlayers() function.

nlayers(DSM_TUD)
## [1] 1

This dataset has only 1 band. We will discuss multi-band raster data in a later episode.

Creating a histogram of raster values

A histogram can be used to inspect the distribution of raster values visually. It can show if there are values above the max or below the min of the expected range. We can create a histogram with the ggplot2 function geom_histogram().

ggplot() +
  geom_histogram(data = DSM_TUD_df, aes(tud.dsm.5m))

Adjust the level of desired detail by setting the number of bins.

ggplot() +
  geom_histogram(data = DSM_TUD_df, aes(tud.dsm.5m), bins = 40)

Looking at the distribution can help us identify bad data values, that is, values that are out of the min/max range.

Challenge 1 (2 minutes)

Use GDALinfo() to determine the following about the tud-dsm-hill.tif file:

  1. Does this file have the same CRS as DSM_TUD?
  2. What is resolution of the raster data?
  3. How large would a 5x5 pixel area be on the Earth’s surface?
  4. Is the file a multi- or single-band raster?
GDALinfo(here("data","tud-dsm-5m-hill.tif"))
## rows        386 
## columns     722 
## bands       1 
## lower left origin.x        83565 
## lower left origin.y        445250 
## res.x       5 
## res.y       5 
## ysign       -1 
## oblique.x   0 
## oblique.y   0 
## driver      GTiff 
## projection  +proj=sterea +lat_0=52.1561605555556 +lon_0=5.38763888888889
## +k=0.9999079 +x_0=155000 +y_0=463000 +ellps=bessel +units=m +no_defs 
## file        /Users/claudiuforgaci/Projects/geospatial-data-carpentry-tud-2023-06/data/tud-dsm-5m-hill.tif 
## apparent band summary:
##   GDType hasNoDataValue NoDataValue blockSize1 blockSize2
## 1   Byte           TRUE           0         11        722
## apparent band statistics:
##   Bmin Bmax Bmean Bsd
## 1    0  255    NA  NA
## Metadata:
## AREA_OR_POINT=Area

Plot raster data

In this part we will plot our raster object using ggplot2 with customized coloring schemes. In the previous plot, our DSM was colored with a continuous colour range. For clarity and visibility, we may prefer to view the data “symbolized” or colored according to ranges of values. This is comparable to a “classified” map. For that, we need to tell ggplot how many groups to break our data into and where those breaks should be. To make these decisions, it is useful to first explore the distribution of the data using a bar plot. To begin with, we will use dplyr’s mutate() function combined with cut() to split the data into 3 bins.

DSM_TUD_df <- DSM_TUD_df %>%
  mutate(fct_elevation = cut(tud.dsm.5m, breaks = 3))

ggplot() +
    geom_bar(data = DSM_TUD_df, aes(fct_elevation))

To see the cutoff values:

unique(DSM_TUD_df$fct_elevation)
## [1] (-5.49,27.1] (27.1,59.6]  (59.6,92.2] 
## Levels: (-5.49,27.1] (27.1,59.6] (59.6,92.2]

To show count number of pixels in each group:

DSM_TUD_df %>%
        group_by(fct_elevation) %>%
        count()
## # A tibble: 3 × 2
## # Groups:   fct_elevation [3]
##   fct_elevation      n
##   <fct>          <int>
## 1 (-5.49,27.1]  277100
## 2 (27.1,59.6]     1469
## 3 (59.6,92.2]      123

To customize cutoff values:

custom_bins <- c(-10, 0, 5, 100)

head(DSM_TUD_df)
##         x        y tud.dsm.5m fct_elevation
## 1 83567.5 447177.5  10.343450  (-5.49,27.1]
## 2 83572.5 447177.5   8.637954  (-5.49,27.1]
## 3 83577.5 447177.5   1.250230  (-5.49,27.1]
## 4 83582.5 447177.5   1.124690  (-5.49,27.1]
## 5 83587.5 447177.5   2.130767  (-5.49,27.1]
## 6 83592.5 447177.5   3.377968  (-5.49,27.1]
DSM_TUD_df <- DSM_TUD_df %>%
  mutate(fct_elevation_cb = cut(tud.dsm.5m, breaks = custom_bins))

head(DSM_TUD_df)
##         x        y tud.dsm.5m fct_elevation fct_elevation_cb
## 1 83567.5 447177.5  10.343450  (-5.49,27.1]          (5,100]
## 2 83572.5 447177.5   8.637954  (-5.49,27.1]          (5,100]
## 3 83577.5 447177.5   1.250230  (-5.49,27.1]            (0,5]
## 4 83582.5 447177.5   1.124690  (-5.49,27.1]            (0,5]
## 5 83587.5 447177.5   2.130767  (-5.49,27.1]            (0,5]
## 6 83592.5 447177.5   3.377968  (-5.49,27.1]            (0,5]
unique(DSM_TUD_df$fct_elevation_cb)
## [1] (5,100] (0,5]   (-10,0]
## Levels: (-10,0] (0,5] (5,100]
ggplot() +
  geom_bar(data = DSM_TUD_df, aes(fct_elevation_cb))

DSM_TUD_df %>%
  group_by(fct_elevation_cb) %>%
  count()
## # A tibble: 3 × 2
## # Groups:   fct_elevation_cb [3]
##   fct_elevation_cb      n
##   <fct>             <int>
## 1 (-10,0]          113877
## 2 (0,5]            101446
## 3 (5,100]           63369
ggplot() +
  geom_raster(data = DSM_TUD_df , aes(x = x, y = y, fill = fct_elevation_cb)) + 
  coord_quickmap()

The plot above uses the default colors inside ggplot for raster objects. We can specify our own colors to make the plot look a little nicer. R has a built in set of colors for plotting terrain, which are built in to the terrain.colors() function. Since we have three bins, we want to create a 3-color palette:

terrain.colors(3)
## [1] "#00A600" "#ECB176" "#F2F2F2"
ggplot() +
 geom_raster(data = DSM_TUD_df , aes(x = x, y = y,
                                      fill = fct_elevation_cb)) + 
    scale_fill_manual(values = terrain.colors(3)) + 
    coord_quickmap()

my_col <- terrain.colors(3)

ggplot() +
 geom_raster(data = DSM_TUD_df , aes(x = x, y = y,
                                      fill = fct_elevation_cb)) + 
    scale_fill_manual(values = my_col, name = "Elevation") + 
    coord_quickmap()

Challenge 2 (5 minutes)

Create a plot of the TU Delft Digital Surface Model (DSM_TUD) that has:

  1. Six classified ranges of values (break points) that are evenly divided among the range of pixel values.
  2. Axis labels.
  3. A plot title.
DSM_TUD_df <- DSM_TUD_df %>%
  mutate(fct_elevation_6 = cut(tud.dsm.5m, breaks = 6))

unique(DSM_TUD_df$fct_elevation_6)
## [1] (-5.49,10.9] (10.9,27.1]  (27.1,43.3]  (43.3,59.6]  (59.6,75.8] 
## [6] (75.8,92.2] 
## 6 Levels: (-5.49,10.9] (10.9,27.1] (27.1,43.3] (43.3,59.6] ... (75.8,92.2]
my_col <- terrain.colors(6)

ggplot() +
  geom_raster(data = DSM_TUD_df, aes(x = x, y = y,
                                       fill = fct_elevation_6)) +
  scale_fill_manual(values = my_col, name = "Elevation") +
  coord_quickmap() +
  xlab("X") +
  ylab("Y") +
  labs(title = "Elevation Classes of the Digital Surface Model (DSM)")

Layering rasters

We can layer a raster on top of a hillshade raster for the same area, and use a transparency factor to create a 3-dimensional shaded effect. A hillshade is a raster that maps the shadows and texture that you would see from above when viewing terrain. We will add a custom color, making the plot grey.

DSM_hill_TUD <- raster(here("data","tud-dsm-5m-hill.tif"))
DSM_hill_TUD
## class      : RasterLayer 
## dimensions : 386, 722, 278692  (nrow, ncol, ncell)
## resolution : 5, 5  (x, y)
## extent     : 83565, 87175, 445250, 447180  (xmin, xmax, ymin, ymax)
## crs        : +proj=sterea +lat_0=52.1561605555556 +lon_0=5.38763888888889 +k=0.9999079 +x_0=155000 +y_0=463000 +ellps=bessel +units=m +no_defs 
## source     : tud-dsm-5m-hill.tif 
## names      : tud.dsm.5m.hill
DSM_hill_TUD_df <- as.data.frame(DSM_hill_TUD, xy = TRUE)
str(DSM_hill_TUD_df)
## 'data.frame':    278692 obs. of  3 variables:
##  $ x              : num  83568 83572 83578 83582 83588 ...
##  $ y              : num  447178 447178 447178 447178 447178 ...
##  $ tud.dsm.5m.hill: num  NA NA NA NA NA NA NA NA NA NA ...
ggplot() +
  geom_raster(data = DSM_hill_TUD_df,
              aes(x = x, y = y, alpha = tud.dsm.5m.hill)) + 
  scale_alpha(range =  c(0.15, 0.65), guide = "none") + 
  coord_quickmap()

We can layer another raster on top of our hillshade by adding another call to the geom_raster() function. Let’s overlay DSM_TUD on top of the DSM_hill_TUD.

ggplot() +
  geom_raster(data = DSM_TUD_df , 
              aes(x = x, y = y, 
                  fill = tud.dsm.5m)) + 
  geom_raster(data = DSM_hill_TUD_df, 
              aes(x = x, y = y, 
                  alpha = tud.dsm.5m.hill)) +  
  scale_fill_viridis_c() +  
  scale_alpha(range = c(0.15, 0.65), guide = "none") +  
  ggtitle("Elevation with hillshade") +
  coord_quickmap()

Challenge 3 (8 minutes)

Use the tud-dtm.tif and tud-dtm-hill.tif files from the data directory to create a Digital Terrain Model map of the TU Delft area.

Make sure to:

  • include hillshade in the maps,
  • label axes,
  • include a title for each map,
  • experiment with various alpha values and color palettes to represent the data.
# import DTM
DTM_TUD <- raster(here("data","tud-dtm-5m.tif"))
DTM_TUD_df <- as.data.frame(DTM_TUD, xy = TRUE)

# DTM Hillshade
DTM_hill_TUD <- raster(here("data","tud-dtm-5m-hill.tif"))
DTM_hill_TUD_df <- as.data.frame(DTM_hill_TUD, xy = TRUE)

ggplot() +
    geom_raster(data = DTM_TUD_df ,
                aes(x = x, y = y,
                     fill = tud.dtm.5m,
                     alpha = 2.0)
                ) +
    geom_raster(data = DTM_hill_TUD_df,
                aes(x = x, y = y,
                  alpha = tud.dtm.5m.hill)
                ) +
    scale_fill_viridis_c() +
    guides(fill = guide_colorbar()) +
    scale_alpha(range = c(0.4, 0.7), guide = "none") +
    theme_bw() +
    theme(panel.grid.major = element_blank(), 
          panel.grid.minor = element_blank()) +
    theme(axis.title.x = element_blank(),
          axis.title.y = element_blank()) +
    ggtitle("DTM with Hillshade") +
    coord_quickmap()

Reproject raster data

What happens when maps don’t line up? That is usually a sign that layers are in different coordinate reference systems (CRS).

In this episode, we will be working with the digital terrain model.

DTM_TUD <- raster(here("data","tud-dtm-5m.tif"))
DTM_hill_TUD <- raster(here("data","tud-dtm-5m-hill-ETRS89.tif"))
DTM_TUD_df <- as.data.frame(DTM_TUD, xy = TRUE)
DTM_hill_TUD_df <- as.data.frame(DTM_hill_TUD, xy = TRUE)
ggplot() +
     geom_raster(data = DTM_TUD_df , 
                 aes(x = x, y = y, 
                  fill = tud.dtm.5m)) + 
     geom_raster(data = DTM_hill_TUD_df, 
                 aes(x = x, y = y, 
                   alpha = tud.dtm.5m.hill.ETRS89)) +
     scale_fill_gradientn(name = "Elevation", colors = terrain.colors(10)) +
     coord_quickmap()

Our results are curious - neither the Digital Terrain Model (DTM_TUD_df) nor the DTM Hillshade (DTM_hill_TUD_df) plotted. Let’s try to plot the DTM on its own to make sure there are data there.

ggplot() +
  geom_raster(data = DTM_TUD_df,
              aes(x = x, y = y,
                  fill = tud.dtm.5m)) +
  scale_fill_gradientn(name = "Elevation", colors = terrain.colors(10)) +
  coord_quickmap()

ggplot() +
  geom_raster(data = DTM_hill_TUD_df,
              aes(x = x, y = y,
                  alpha = tud.dtm.5m.hill.ETRS89)) +
  coord_quickmap()

If we look at the axes, we can see that the projections of the two rasters are different.

Challenge 4 (2 minutes)

View the CRS for each of these two datasets. What projection does each use?

crs(DTM_TUD)
## Coordinate Reference System:
## Deprecated Proj.4 representation:
##  +proj=sterea +lat_0=52.1561605555556 +lon_0=5.38763888888889
## +k=0.9999079 +x_0=155000 +y_0=463000 +ellps=bessel +units=m +no_defs 
## WKT2 2019 representation:
## PROJCRS["unknown",
##     BASEGEOGCRS["unknown",
##         DATUM["Unknown based on Bessel 1841 ellipsoid",
##             ELLIPSOID["Bessel 1841",6377397.155,299.1528128,
##                 LENGTHUNIT["metre",1,
##                     ID["EPSG",9001]]]],
##         PRIMEM["Greenwich",0,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8901]]],
##     CONVERSION["unknown",
##         METHOD["Oblique Stereographic",
##             ID["EPSG",9809]],
##         PARAMETER["Latitude of natural origin",52.1561605555556,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8801]],
##         PARAMETER["Longitude of natural origin",5.38763888888889,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8802]],
##         PARAMETER["Scale factor at natural origin",0.9999079,
##             SCALEUNIT["unity",1],
##             ID["EPSG",8805]],
##         PARAMETER["False easting",155000,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8806]],
##         PARAMETER["False northing",463000,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8807]]],
##     CS[Cartesian,2],
##         AXIS["(E)",east,
##             ORDER[1],
##             LENGTHUNIT["metre",1,
##                 ID["EPSG",9001]]],
##         AXIS["(N)",north,
##             ORDER[2],
##             LENGTHUNIT["metre",1,
##                 ID["EPSG",9001]]]]
crs(DTM_hill_TUD)
## Coordinate Reference System:
## Deprecated Proj.4 representation:
##  +proj=laea +lat_0=52 +lon_0=10 +x_0=4321000 +y_0=3210000 +ellps=GRS80
## +units=m +no_defs 
## WKT2 2019 representation:
## PROJCRS["unknown",
##     BASEGEOGCRS["unknown",
##         DATUM["Unknown based on GRS80 ellipsoid",
##             ELLIPSOID["GRS 1980",6378137,298.257222101,
##                 LENGTHUNIT["metre",1],
##                 ID["EPSG",7019]]],
##         PRIMEM["Greenwich",0,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8901]]],
##     CONVERSION["unknown",
##         METHOD["Lambert Azimuthal Equal Area",
##             ID["EPSG",9820]],
##         PARAMETER["Latitude of natural origin",52,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8801]],
##         PARAMETER["Longitude of natural origin",10,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8802]],
##         PARAMETER["False easting",4321000,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8806]],
##         PARAMETER["False northing",3210000,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8807]]],
##     CS[Cartesian,2],
##         AXIS["(E)",east,
##             ORDER[1],
##             LENGTHUNIT["metre",1,
##                 ID["EPSG",9001]]],
##         AXIS["(N)",north,
##             ORDER[2],
##             LENGTHUNIT["metre",1,
##                 ID["EPSG",9001]]]]

Reproject rasters

We can use the projectRaster() function to reproject a raster into a new CRS. Keep in mind that reprojection only works when you first have a defined CRS for the raster object that you want to reproject. It cannot be used if no CRS is defined.

DTM_hill_EPSG28992_TUD <- projectRaster(DTM_hill_TUD,
                                       crs = crs(DTM_TUD))
crs(DTM_hill_EPSG28992_TUD)
## Coordinate Reference System:
## Deprecated Proj.4 representation:
##  +proj=sterea +lat_0=52.1561605555556 +lon_0=5.38763888888889
## +k=0.9999079 +x_0=155000 +y_0=463000 +ellps=bessel +units=m +no_defs 
## WKT2 2019 representation:
## PROJCRS["unknown",
##     BASEGEOGCRS["unknown",
##         DATUM["Unknown based on Bessel 1841 ellipsoid",
##             ELLIPSOID["Bessel 1841",6377397.155,299.1528128,
##                 LENGTHUNIT["metre",1,
##                     ID["EPSG",9001]]]],
##         PRIMEM["Greenwich",0,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8901]]],
##     CONVERSION["unknown",
##         METHOD["Oblique Stereographic",
##             ID["EPSG",9809]],
##         PARAMETER["Latitude of natural origin",52.1561605555556,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8801]],
##         PARAMETER["Longitude of natural origin",5.38763888888889,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8802]],
##         PARAMETER["Scale factor at natural origin",0.9999079,
##             SCALEUNIT["unity",1],
##             ID["EPSG",8805]],
##         PARAMETER["False easting",155000,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8806]],
##         PARAMETER["False northing",463000,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8807]]],
##     CS[Cartesian,2],
##         AXIS["(E)",east,
##             ORDER[1],
##             LENGTHUNIT["metre",1,
##                 ID["EPSG",9001]]],
##         AXIS["(N)",north,
##             ORDER[2],
##             LENGTHUNIT["metre",1,
##                 ID["EPSG",9001]]]]
crs(DTM_hill_TUD)
## Coordinate Reference System:
## Deprecated Proj.4 representation:
##  +proj=laea +lat_0=52 +lon_0=10 +x_0=4321000 +y_0=3210000 +ellps=GRS80
## +units=m +no_defs 
## WKT2 2019 representation:
## PROJCRS["unknown",
##     BASEGEOGCRS["unknown",
##         DATUM["Unknown based on GRS80 ellipsoid",
##             ELLIPSOID["GRS 1980",6378137,298.257222101,
##                 LENGTHUNIT["metre",1],
##                 ID["EPSG",7019]]],
##         PRIMEM["Greenwich",0,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8901]]],
##     CONVERSION["unknown",
##         METHOD["Lambert Azimuthal Equal Area",
##             ID["EPSG",9820]],
##         PARAMETER["Latitude of natural origin",52,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8801]],
##         PARAMETER["Longitude of natural origin",10,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8802]],
##         PARAMETER["False easting",4321000,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8806]],
##         PARAMETER["False northing",3210000,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8807]]],
##     CS[Cartesian,2],
##         AXIS["(E)",east,
##             ORDER[1],
##             LENGTHUNIT["metre",1,
##                 ID["EPSG",9001]]],
##         AXIS["(N)",north,
##             ORDER[2],
##             LENGTHUNIT["metre",1,
##                 ID["EPSG",9001]]]]
extent(DTM_hill_EPSG28992_TUD)
## class      : Extent 
## xmin       : 83395.44 
## xmax       : 87301.54 
## ymin       : 444886.2 
## ymax       : 447328
extent(DTM_hill_TUD)
## class      : Extent 
## xmin       : 3933146 
## xmax       : 3936870 
## ymin       : 3223825 
## ymax       : 3225980

Question

Why do you think the two extents differ?

Dealing with raster resolution

res(DTM_hill_EPSG28992_TUD)
## [1] 5.30 4.66
res(DTM_TUD)
## [1] 5 5

These two resolutions are different, but they’re representing the same data. We can tell R to force our newly reprojected raster to be the same as DTM_TUD by using res(DTM_TUD).

DTM_hill_EPSG28992_TUD <- projectRaster(DTM_hill_TUD,
                                         crs = crs(DTM_TUD),
                                         res = res(DTM_TUD))
res(DTM_hill_EPSG28992_TUD)
## [1] 5 5
res(DTM_TUD)
## [1] 5 5
DTM_hill_TUD_2_df <- as.data.frame(DTM_hill_EPSG28992_TUD, xy = TRUE)
ggplot() +
     geom_raster(data = DTM_TUD_df , 
                 aes(x = x, y = y, 
                  fill = tud.dtm.5m)) + 
     geom_raster(data = DTM_hill_TUD_2_df, 
                 aes(x = x, y = y, 
                   alpha = tud.dtm.5m.hill.ETRS89)) +
     scale_fill_gradientn(name = "Elevation", colors = terrain.colors(10)) + 
     coord_quickmap()

Raster calculations

We often want to perform calculations on two or more rasters to create a new output raster. For example, if we are interested in mapping the heights of trees and buildings across an entire field site, we might want to calculate the difference between the Digital Surface Model (DSM, tops of trees and buildings) and the Digital Terrain Model (DTM, ground level). The resulting dataset is referred to as a Canopy Height Model (CHM) and represents the actual height of trees, buildings, etc. with the influence of ground elevation removed.

Raster calculations in R

GDALinfo(here("data","tud-dtm-5m.tif"))
## rows        386 
## columns     722 
## bands       1 
## lower left origin.x        83565 
## lower left origin.y        445250 
## res.x       5 
## res.y       5 
## ysign       -1 
## oblique.x   0 
## oblique.y   0 
## driver      GTiff 
## projection  +proj=sterea +lat_0=52.1561605555556 +lon_0=5.38763888888889
## +k=0.9999079 +x_0=155000 +y_0=463000 +ellps=bessel +units=m +no_defs 
## file        /Users/claudiuforgaci/Projects/geospatial-data-carpentry-tud-2023-06/data/tud-dtm-5m.tif 
## apparent band summary:
##    GDType hasNoDataValue NoDataValue blockSize1 blockSize2
## 1 Float32          FALSE           0          2        722
## apparent band statistics:
##          Bmin       Bmax Bmean Bsd
## 1 -4294967295 4294967295    NA  NA
## Metadata:
## AREA_OR_POINT=Area
GDALinfo(here("data","tud-dsm-5m.tif"))
## rows        386 
## columns     722 
## bands       1 
## lower left origin.x        83565 
## lower left origin.y        445250 
## res.x       5 
## res.y       5 
## ysign       -1 
## oblique.x   0 
## oblique.y   0 
## driver      GTiff 
## projection  +proj=sterea +lat_0=52.1561605555556 +lon_0=5.38763888888889
## +k=0.9999079 +x_0=155000 +y_0=463000 +ellps=bessel +units=m +no_defs 
## file        /Users/claudiuforgaci/Projects/geospatial-data-carpentry-tud-2023-06/data/tud-dsm-5m.tif 
## apparent band summary:
##    GDType hasNoDataValue NoDataValue blockSize1 blockSize2
## 1 Float32          FALSE           0          2        722
## apparent band statistics:
##   Bmin Bmax Bmean Bsd
## 1    0    0     0   0
## Metadata:
## AREA_OR_POINT=Area
 ggplot() +
      geom_raster(data = DTM_TUD_df , 
              aes(x = x, y = y, fill = tud.dtm.5m)) +
     scale_fill_gradientn(name = "Elevation", colors = terrain.colors(10)) + 
     coord_quickmap()

 ggplot() +
      geom_raster(data = DSM_TUD_df , 
              aes(x = x, y = y, fill = tud.dsm.5m)) +
     scale_fill_gradientn(name = "Elevation", colors = terrain.colors(10)) + 
     coord_quickmap()

Raster math and Canopy Height Models

CHM_TUD <- DSM_TUD - DTM_TUD

CHM_TUD_df <- as.data.frame(CHM_TUD, xy = TRUE)
 ggplot() +
   geom_raster(data = CHM_TUD_df , 
               aes(x = x, y = y, fill = layer)) + 
   scale_fill_gradientn(name = "Canopy Height", colors = terrain.colors(10)) + 
   coord_quickmap()

ggplot(CHM_TUD_df) +
    geom_histogram(aes(layer))

Challenge 5 (5 minutes)

It’s often a good idea to explore the range of values in a raster dataset just like we might explore a dataset that we collected in the field.

  1. What is the min and maximum value for the Canopy Height Model CHM_TUD that we just created?
  2. What are two ways you can check this range of data for CHM_TUD?
  3. What is the distribution of all the pixel values in the CHM?
  4. Plot a histogram with 6 bins instead of the default and change the color of the histogram.
  5. Plot the CHM_TUD raster using breaks that make sense for the data. Include an appropriate color palette for the data, plot title and no axes ticks / labels.
min(CHM_TUD_df$layer, na.rm = TRUE)
## [1] -3.638057
max(CHM_TUD_df$layer, na.rm = TRUE)
## [1] 92.08102
ggplot(CHM_TUD_df) +
    geom_histogram(aes(layer))

ggplot(CHM_TUD_df) +
    geom_histogram(aes(layer), colour="black", 
                   fill="darkgreen", bins = 6)

custom_bins <- c(0, 10, 20, 30, 100)
CHM_TUD_df <- CHM_TUD_df %>%
                  mutate(canopy_discrete = cut(layer, breaks = custom_bins))

ggplot() +
  geom_raster(data = CHM_TUD_df , aes(x = x, y = y,
                                       fill = canopy_discrete)) + 
     scale_fill_manual(values = terrain.colors(4)) + 
     coord_quickmap()

Export a GeoTIFF

writeRaster(CHM_TUD, here("fig_output","CHM_TUD.tiff"),
            format="GTiff",
            overwrite=TRUE,
            NAflag=-9999)

Work with multi-band rasters

Getting Started with Multi-Band Data in R

In this episode, the multi-band data that we are working with is high resolution imagery for the Netherlands.

The raster() function only reads in the first band, in this case the red band of an RGB raster.

RGB_band1_TUD <- raster(here("data","tudlib-rgb.tif"))
RGB_band1_TUD_df  <- as.data.frame(RGB_band1_TUD, xy = TRUE)
ggplot() +
  geom_raster(data = RGB_band1_TUD_df,
              aes(x = x, y = y, alpha = tudlib.rgb_1)) + 
  coord_quickmap()  # use `coord_equal()` instead

RGB_band1_TUD
## class      : RasterLayer 
## band       : 1  (of  3  bands)
## dimensions : 4988, 4866, 24271608  (nrow, ncol, ncell)
## resolution : 0.08, 0.08  (x, y)
## extent     : 85272, 85661.28, 446295.2, 446694.2  (xmin, xmax, ymin, ymax)
## crs        : +proj=sterea +lat_0=52.1561605555556 +lon_0=5.38763888888889 +k=0.9999079 +x_0=155000 +y_0=463000 +ellps=bessel +units=m +no_defs 
## source     : tudlib-rgb.tif 
## names      : tudlib.rgb_1
nbands(RGB_band1_TUD)
## [1] 3

To import the green band:

RGB_band2_TUD <- raster(here("data","tudlib-rgb.tif"), band = 2)
RGB_band2_TUD_df <- as.data.frame(RGB_band2_TUD, xy = TRUE)
ggplot() +
  geom_raster(data = RGB_band2_TUD_df,
              aes(x = x, y = y, alpha = tudlib.rgb_1)) + 
  coord_equal() 

Raster stacks

There is a better way of reading in all bands. The stack() function brings in all bands

RGB_stack_TUD <- stack(here("data","tudlib-rgb.tif"))
RGB_stack_TUD
## class      : RasterStack 
## dimensions : 4988, 4866, 24271608, 3  (nrow, ncol, ncell, nlayers)
## resolution : 0.08, 0.08  (x, y)
## extent     : 85272, 85661.28, 446295.2, 446694.2  (xmin, xmax, ymin, ymax)
## crs        : +proj=sterea +lat_0=52.1561605555556 +lon_0=5.38763888888889 +k=0.9999079 +x_0=155000 +y_0=463000 +ellps=bessel +units=m +no_defs 
## names      : tudlib.rgb_1, tudlib.rgb_2, tudlib.rgb_3
RGB_stack_TUD@layers
## [[1]]
## class      : RasterLayer 
## band       : 1  (of  3  bands)
## dimensions : 4988, 4866, 24271608  (nrow, ncol, ncell)
## resolution : 0.08, 0.08  (x, y)
## extent     : 85272, 85661.28, 446295.2, 446694.2  (xmin, xmax, ymin, ymax)
## crs        : +proj=sterea +lat_0=52.1561605555556 +lon_0=5.38763888888889 +k=0.9999079 +x_0=155000 +y_0=463000 +ellps=bessel +units=m +no_defs 
## source     : tudlib-rgb.tif 
## names      : tudlib.rgb_1 
## 
## 
## [[2]]
## class      : RasterLayer 
## band       : 2  (of  3  bands)
## dimensions : 4988, 4866, 24271608  (nrow, ncol, ncell)
## resolution : 0.08, 0.08  (x, y)
## extent     : 85272, 85661.28, 446295.2, 446694.2  (xmin, xmax, ymin, ymax)
## crs        : +proj=sterea +lat_0=52.1561605555556 +lon_0=5.38763888888889 +k=0.9999079 +x_0=155000 +y_0=463000 +ellps=bessel +units=m +no_defs 
## source     : tudlib-rgb.tif 
## names      : tudlib.rgb_2 
## 
## 
## [[3]]
## class      : RasterLayer 
## band       : 3  (of  3  bands)
## dimensions : 4988, 4866, 24271608  (nrow, ncol, ncell)
## resolution : 0.08, 0.08  (x, y)
## extent     : 85272, 85661.28, 446295.2, 446694.2  (xmin, xmax, ymin, ymax)
## crs        : +proj=sterea +lat_0=52.1561605555556 +lon_0=5.38763888888889 +k=0.9999079 +x_0=155000 +y_0=463000 +ellps=bessel +units=m +no_defs 
## source     : tudlib-rgb.tif 
## names      : tudlib.rgb_3
RGB_stack_TUD[[2]]
## class      : RasterLayer 
## band       : 2  (of  3  bands)
## dimensions : 4988, 4866, 24271608  (nrow, ncol, ncell)
## resolution : 0.08, 0.08  (x, y)
## extent     : 85272, 85661.28, 446295.2, 446694.2  (xmin, xmax, ymin, ymax)
## crs        : +proj=sterea +lat_0=52.1561605555556 +lon_0=5.38763888888889 +k=0.9999079 +x_0=155000 +y_0=463000 +ellps=bessel +units=m +no_defs 
## source     : tudlib-rgb.tif 
## names      : tudlib.rgb_2
RGB_stack_TUD_df  <- as.data.frame(RGB_stack_TUD, xy = TRUE)
str(RGB_stack_TUD_df)
## 'data.frame':    24271608 obs. of  5 variables:
##  $ x           : num  85272 85272 85272 85272 85272 ...
##  $ y           : num  446694 446694 446694 446694 446694 ...
##  $ tudlib.rgb_1: num  52 48 47 49 47 45 47 48 49 54 ...
##  $ tudlib.rgb_2: num  64 58 57 60 57 55 58 59 62 69 ...
##  $ tudlib.rgb_3: num  57 49 49 53 50 47 51 54 57 68 ...
ggplot() +
  geom_histogram(data = RGB_stack_TUD_df, aes(tudlib.rgb_1))

ggplot() +
  geom_raster(data = RGB_stack_TUD_df,
              aes(x = x, y = y, alpha = tudlib.rgb_2)) + 
  coord_equal()

Create a three-band image

To create an RGB image, we will use the plotRGB function from the raster package.

plotRGB(RGB_stack_TUD,
        r = 1, g = 2, b = 3)

The image above looks pretty good. We can explore whether applying a stretch to the image might improve clarity and contrast using stretch="lin" or stretch="hist".

plotRGB(RGB_stack_TUD,
        r = 1, g = 2, b = 3,
        scale = 800,
        stretch = "lin")

plotRGB(RGB_stack_TUD,
        r = 1, g = 2, b = 3,
        scale = 800,
        stretch = "hist")

RasterStack vs. RasterBrick

The R RasterStack and RasterBrick object types can both store multiple bands. However, how they store each band is different. The bands in a RasterStack are stored as links to raster data that is located somewhere on our computer. A RasterBrick contains all of the objects stored within the actual R object. In most cases, we can work with a RasterBrick in the same way we might work with a RasterStack. However a RasterBrick is often more efficient and faster to process - which is important when working with larger files.

object.size(RGB_stack_TUD)
## 56712 bytes
RGB_brick_TUD <- brick(RGB_stack_TUD)

object.size(RGB_brick_TUD)
## 17128 bytes
plotRGB(RGB_brick_TUD)